Lab Assignment Two: Exploring Image Data¶
Authors¶
- Juliana Antonio
- Xiaona Hang
- Chuanqi Deng
1. Business Understanding¶
This dataset was collected from Bing Images searches for various car types and uploaded to Kaggle. It contains 4,165 images across 7 classes of cars, all larger than the 20 x 20 pixel size requirement. It is separated into two folders, testing and training, with subfolders for each car category. Although there is no information on why the dataset was collected, it was likely intended for training and evaluating machine learning models on image classification tasks related to automobiles.
Classifying vehicles is of great importance in multiple facets of industry, chief among them transport and logistics management systems (also known as Intelligent Transport Systems, or ITS), for traffic accident investigation, traffic flow monitoring, and autonomous driving, as indicated by a recent review (https://www.mdpi.com/1424-8220/23/10/4832). The typical method for predicting the features/type of a car from an image is deep learning, such as convolutional neural networks (CNNs) for vehicle classification, which have seen some success even with low-resolution images (https://www.mdpi.com/1424-8220/22/13/4740). Before one can use such highly sophisticated methods, it is imperative to prepare the data and reduce its dimensionality through feature extraction. In this way, we can improve the efficiency of classification and reduce the computational resources required for the predictive task.
As for real-life applications, as mentioned briefly with ITS, we can use the prediction task to identify the automobile type; then, using information about the types of vehicles present in different areas, traffic management authorities can make informed decisions about traffic flow optimization, lane management, and infrastructure planning.
Measures of Success¶
In the application of traffic planning/monitoring with classifying a variety of automobiles on the road, there are a few ways to measure the success of the predictive model. Prediction accuracy is fundamental, since the traffic management system relies on the predictions to make informed decisions on traffic flow optimization. Another important factor is real-time classification with minimal delay or latency, as well as high recall (or sensitivity), where the recall score represents the model's ability to correctly classify or detect instances of interest in the images. However, an often overlooked factor that is paramount to the success of a machine learning model is robustness.
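Accuracy, recall, and latency can each be quantified with standard tooling. The following is a minimal sketch using hypothetical labels and predictions (not our dataset), with scikit-learn's scorers and a simple wall-clock timer:

```python
import time
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical ground-truth labels and model predictions for 7 car classes
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2])
y_pred = np.array([0, 1, 2, 3, 4, 5, 5, 0, 2, 2])

accuracy = accuracy_score(y_true, y_pred)
# Macro-averaged recall treats every class equally, regardless of support
recall = recall_score(y_true, y_pred, average='macro')

# Latency: time a (placeholder) prediction step
start = time.perf_counter()
_ = y_pred.copy()  # stand-in for model.predict(X)
latency_ms = (time.perf_counter() - start) * 1000

print(f"Accuracy: {accuracy:.2f}, Macro recall: {recall:.2f}, Latency: {latency_ms:.3f} ms")
```

In a deployed ITS setting, the latency measurement would wrap the real inference call rather than a placeholder.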
For example, let us take random NumPy arrays as a stand-in dataset: a training set of 800 synthetic images and a testing set of 200, each 32 x 32 pixels with 3 color channels (RGB):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Load and preprocess the dataset
# For demonstration, we'll generate synthetic data
X_train = np.random.rand(800, 32, 32, 3)
y_train = np.random.randint(0, 7, size=800)
X_test = np.random.rand(200, 32, 32, 3)
y_test = np.random.randint(0, 7, size=200)
Next, we can arbitrarily create and train a convolutional neural network (CNN) model for image classification, with an architecture comprising convolutional and fully connected layers, compiled with the Adam optimizer and evaluated using sparse categorical cross-entropy loss and an accuracy metric.
# Define a function to train the classification model
def train_model(X_train, y_train):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(7, activation='softmax')  # Assuming 7 output classes
    ])
    # Compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
    return model
Now, we will define a function 'evaluate_robustness' to assess the robustness of a trained classification model. It evaluates the model's accuracy on the original test data, on augmented test data (with variations in rotation, width, height, and zoom), and on adversarially perturbed test data (a random-sign perturbation of fixed strength).
# Define a function to perform robustness testing
def evaluate_robustness(model, X_test, y_test):
    # Evaluate accuracy on original test data
    accuracy_original = model.evaluate(X_test, y_test, verbose=0)[1]
    print("Accuracy on original test data:", accuracy_original)
    # Data augmentation for robustness testing
    datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.1,
                                 height_shift_range=0.1, zoom_range=0.1)
    augmented_iterator = datagen.flow(X_test, y_test, shuffle=False)
    X_augmented, y_augmented = next(augmented_iterator)
    accuracy_augmented = model.evaluate(X_augmented, y_augmented, verbose=0)[1]
    print("Accuracy on augmented test data:", accuracy_augmented)
    # Adversarial perturbation for robustness testing (random-sign noise)
    epsilon = 0.1  # Perturbation strength
    X_adversarial = X_test + epsilon * np.sign(np.random.randn(*X_test.shape))
    X_adversarial = np.clip(X_adversarial, 0, 1)
    # Evaluate accuracy on adversarial test data
    accuracy_adversarial = model.evaluate(X_adversarial, y_test, verbose=0)[1]
    print("Accuracy on adversarial test data:", accuracy_adversarial)
model = train_model(X_train, y_train)
# Evaluate robustness
evaluate_robustness(model, X_test, y_test)
Epoch 1/10 23/23 [==============================] - 0s 10ms/step - loss: 1.9510 - accuracy: 0.1500 - val_loss: 1.9489 - val_accuracy: 0.1125
Epoch 2/10 23/23 [==============================] - 0s 7ms/step - loss: 1.9409 - accuracy: 0.1708 - val_loss: 1.9617 - val_accuracy: 0.1125
Epoch 3/10 23/23 [==============================] - 0s 7ms/step - loss: 1.9404 - accuracy: 0.1708 - val_loss: 1.9611 - val_accuracy: 0.1125
Epoch 4/10 23/23 [==============================] - 0s 7ms/step - loss: 1.9367 - accuracy: 0.1708 - val_loss: 1.9566 - val_accuracy: 0.1125
Epoch 5/10 23/23 [==============================] - 0s 7ms/step - loss: 1.9330 - accuracy: 0.1708 - val_loss: 1.9571 - val_accuracy: 0.1250
Epoch 6/10 23/23 [==============================] - 0s 7ms/step - loss: 1.9244 - accuracy: 0.1931 - val_loss: 1.9725 - val_accuracy: 0.1125
Epoch 7/10 23/23 [==============================] - 0s 8ms/step - loss: 1.9123 - accuracy: 0.1778 - val_loss: 1.9666 - val_accuracy: 0.1125
Epoch 8/10 23/23 [==============================] - 0s 7ms/step - loss: 1.8803 - accuracy: 0.2972 - val_loss: 1.9774 - val_accuracy: 0.1250
Epoch 9/10 23/23 [==============================] - 0s 7ms/step - loss: 1.8149 - accuracy: 0.3333 - val_loss: 1.9724 - val_accuracy: 0.1625
Epoch 10/10 23/23 [==============================] - 0s 7ms/step - loss: 1.6859 - accuracy: 0.4347 - val_loss: 2.0202 - val_accuracy: 0.1875
Accuracy on original test data: 0.06499999761581421
Accuracy on augmented test data: 0.15625
Accuracy on adversarial test data: 0.08500000089406967
With this randomly generated data, the accuracies above provide insight into how the model generalizes and performs under various conditions. Here, the model's performance is low across all datasets, indicating potential issues with generalization and robustness. Further analysis and improvements would be necessary to enhance performance, and we can use evaluations like this as one of our measures of success.
Dataset source: https://www.kaggle.com/datasets/kshitij192/cars-image-dataset
from collections import defaultdict
from pathlib import Path
import matplotlib.pyplot as plt
import cv2
import numpy as np
from sklearn.decomposition import PCA
from skimage import metrics
from sklearn.preprocessing import LabelEncoder, minmax_scale
images = defaultdict(list)
labels = defaultdict(list)
uniformed_size = (224,224)
# load and resize images
for image_path in Path('data').rglob('*.jpg'):
    class_name = image_path.parent.stem
    train_test = image_path.parent.parent.stem
    img = cv2.resize(cv2.imread(str(image_path)), uniformed_size)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    images[train_test].append(img)
    labels[train_test].append(class_name)
# concatenate to two ndarrays
train_imgs = np.concatenate([images['train']])
test_imgs = np.concatenate([images['test']])
# encode string labels to numeric labels
encoder = LabelEncoder()
encoder.fit(labels['test'])
test_labels = encoder.transform(labels['test'])
train_labels = encoder.transform(labels['train'])
print(f"Train: {train_imgs.shape}, Test: {test_imgs.shape}")
Train: (3352, 224, 224, 3), Test: (813, 224, 224, 3)
2.2. Linearize¶
train_imgs = train_imgs.reshape((len(train_imgs), -1))
test_imgs = test_imgs.reshape((len(test_imgs), -1))
print(f"Train: {train_imgs.shape}, Test: {test_imgs.shape}")
Train: (3352, 150528), Test: (813, 150528)
2.3. Visualization¶
plt.figure(figsize=(25,10))
for i, r_idx in enumerate(np.random.randint(0, len(images['train']), 10)):
    ax = plt.subplot(2, 5, i+1)
    ax.imshow(images['train'][r_idx])
3. Data Reduction¶
3.1 PCA¶
def run_PCA_analysis(data, n_components=150, required_component_ratio=0.8, randomized=False):
    # PCA
    pca = PCA(n_components, svd_solver="randomized" if randomized else "auto")
    pca_data = pca.fit_transform(data)
    # Analyze how many components are required to adequately represent the image data
    accumulated_ratio = np.cumsum(pca.explained_variance_ratio_)
    ratio_idx = np.argmax(accumulated_ratio >= required_component_ratio)
    # plotting
    plt.figure(figsize=(20, 5))
    plt.bar(range(1, n_components+1), pca.explained_variance_ratio_)
    plt.xticks(range(1, n_components+1, 3), rotation=70)
    plt.yticks(np.arange(0, 0.30, 0.01), [f"{x*100:.1f}%" for x in np.arange(0, 0.30, 0.01)])
    plt.xlim(0, n_components+1)
    ax = plt.twinx()
    ax.set_yticks(np.arange(0, 1.01, 0.05), [f"{x*100:.1f}%" for x in np.arange(0, 1.01, 0.05)])
    ax.set_ylim(0, 1)
    ax.plot(range(1, n_components+1), accumulated_ratio, color='orange')
    ax.axvline(x=ratio_idx, color='red', linestyle='--')
    ax.axhline(y=required_component_ratio, color='red', linestyle='--')
    plt.title("explained variance of each component")
    return pca, pca_data
n_components = 150
required_component_ratio = 0.8
original_images = train_imgs
%time pca, pca_data = run_PCA_analysis(original_images, n_components, required_component_ratio);
CPU times: total: 5min 17s Wall time: 33.4 s
The analysis above used PCA to reduce the image dimension from 150,528 to 150. The number of components required to adequately represent the image data is then determined by a 0.8 explained-variance threshold. As the red dotted lines show, the first 112 principal components can adequately represent the images at the given threshold.
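As an aside, scikit-learn can perform this threshold-based selection directly: passing a float between 0 and 1 as `n_components` keeps the smallest number of components whose cumulative explained variance reaches that fraction. A small sketch on synthetic data (the exact component count depends on the data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for flattened image rows: 200 samples, 500 features
X = rng.normal(size=(200, 500))

# A float n_components asks PCA to keep the smallest number of components
# whose cumulative explained variance reaches 80%
pca = PCA(n_components=0.8, svd_solver='full')
X_reduced = pca.fit_transform(X)

print(f"Components kept: {pca.n_components_}")
print(f"Cumulative variance: {pca.explained_variance_ratio_.sum():.3f}")
```

This avoids guessing an upper bound like 150 up front, at the cost of computing the full SVD.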
3.2 Random PCA¶
%time random_pca, random_pca_data = run_PCA_analysis(original_images, n_components, required_component_ratio, randomized = True);
CPU times: total: 5min 23s Wall time: 33.6 s
The randomized PCA produced a similar result to standard PCA: the number of components required to adequately represent the image data is again 112.
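The practical appeal of randomized PCA is speed: randomized SVD approximates only the leading components instead of computing the full decomposition. A sketch comparing the two solvers on synthetic data (timings vary by machine, so we only check that the leading explained-variance ratios agree closely):

```python
import time
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Synthetic wide matrix standing in for flattened images
X = rng.normal(size=(500, 2000))

def timed_pca(solver):
    # Fit PCA with the given SVD solver and record wall-clock time
    start = time.perf_counter()
    pca = PCA(n_components=50, svd_solver=solver, random_state=42)
    pca.fit(X)
    return time.perf_counter() - start, pca

t_full, pca_full = timed_pca('full')
t_rand, pca_rand = timed_pca('randomized')

print(f"full: {t_full:.3f}s, randomized: {t_rand:.3f}s")
```

On our actual 3352 x 150528 matrix the wall times above were nearly identical, likely because scikit-learn's "auto" solver already chose a randomized strategy for such a wide matrix.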
3.3 Comparison between PCA and Randomized PCA¶
def reconstruct_image(trans_obj, low_rep):
    # reconstruct images from reduced data
    rec_image = trans_obj.inverse_transform(low_rep)
    shp = rec_image.shape
    rec_image = minmax_scale(rec_image.ravel(), (0, 255)).astype(np.uint8)
    return rec_image.reshape(shp)

def reconstruct(randomized_PCA=False):
    # reconstruct a few images and visualize
    plt.figure(figsize=(25, 8))
    for i, r_idx in enumerate(np.random.randint(0, len(original_images), 5)):
        ax = plt.subplot(2, 5, i+1)
        ax.imshow(original_images[r_idx].reshape((*uniformed_size, -1)))
        low_rep = random_pca_data[r_idx:r_idx+1] if randomized_PCA else pca_data[r_idx:r_idx+1]
        rec_image = reconstruct_image(random_pca if randomized_PCA else pca, low_rep)
        ax = plt.subplot(2, 5, i+6)
        ax.imshow(rec_image.reshape((*uniformed_size, -1)))
        # calculate MSE and SSIM (cast to float to avoid uint8 overflow in the difference)
        SSIM = metrics.structural_similarity(original_images[r_idx], rec_image.ravel())
        MSE = np.mean((original_images[r_idx].astype(float) - rec_image.ravel().astype(float)) ** 2)
        ax.set_title(f"SSIM: {SSIM:.2f}, MSE: {MSE:.2f}")
    plt.suptitle(('Randomized ' if randomized_PCA else '') + 'PCA Reconstruction', fontsize=20)
# make sure the comparison use same set of images
seed = np.random.randint(0, 9999)
np.random.seed(seed)
reconstruct(randomized_PCA = False)
np.random.seed(seed)
reconstruct(randomized_PCA = True)
def calc_metrics(rec_images):
    # calculate SSIM and MSE metrics
    SSIMs, MSEs = [], []
    for i in range(original_images.shape[0]):
        # cast to float to avoid uint8 overflow in the difference
        MSE = np.mean((original_images[i].astype(float) - rec_images[i].astype(float)) ** 2)
        MSEs.append(MSE)
        SSIM = metrics.structural_similarity(original_images[i], rec_images[i])
        SSIMs.append(SSIM)
    return SSIMs, MSEs
# reconstruct all images
rec_images = reconstruct_image(pca, pca_data)
rec_images_random = reconstruct_image(random_pca, random_pca_data)
# calculate SSIM and MSE
SSIMs_pca, MSEs_pca = calc_metrics(rec_images)
SSIMs_random_pca, MSEs_random_pca = calc_metrics(rec_images_random)
# Visualize SSIM and MSE
plt.figure(figsize=(6,8))
ax = plt.subplot(1,2,1)
ax.boxplot([SSIMs_pca,SSIMs_random_pca], showmeans=True, widths = 0.8)
ax.set_xticklabels(['PCA', 'Randomized PCA'])
ax.set_xlabel('SSIM')
#ax.axhline(np.mean(SSIMs_pca))
ax = plt.subplot(1,2,2)
ax.boxplot([MSEs_pca, MSEs_random_pca], showmeans=True, widths = 0.8)
#ax.axhline(np.mean(MSEs_pca))
ax.set_xticklabels(['PCA', 'Randomized PCA'])
ax.set_xlabel('MSE')
Text(0.5, 0, 'MSE')
From the analysis above, PCA has a slightly better average representation (green triangles) than Randomized PCA: it presents higher similarity (SSIM) on the reconstructed images and lower MSE. Therefore, PCA is better at representing the images with fewer components. Nevertheless, we prefer Randomized PCA, as it is more efficient on large datasets.
3.4 Feature extraction¶
Start by calculating gradients
from skimage.io import imshow
from skimage.filters import sobel_h, sobel_v
from skimage.color import rgb2gray
import warnings
warnings.filterwarnings("ignore")
h, w = uniformed_size # As per the initial setup
plt.figure(figsize=(10, 5))
# Select a random image from the training dataset
idx_to_reconstruct = np.random.randint(len(train_imgs))
img = train_imgs[idx_to_reconstruct].reshape((h, w, 3))
plt.subplot(1, 2, 1)
# gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_img = rgb2gray(img)
imshow(gray_img, cmap='gray') # Display the grayscale image
plt.title('Original Image')
plt.grid(False)
plt.subplot(1, 2, 2)
# Calculate and show the gradient magnitude
gradient_mag = np.sqrt(sobel_v(gray_img)**2 + sobel_h(gray_img)**2)
imshow(gradient_mag, cmap='gray')
plt.title('Gradient Magnitude')
plt.grid(False)
plt.show()
Implement Gabor filter feature extraction
from skimage.filters import gabor
from skimage import img_as_float, transform
def extract_gabor_features(images, resize_dim=(100, 100), frequency=0.6):
    gabor_features = []
    for img in images:
        # Resize image to reduce computation time
        small_image = transform.resize(img, resize_dim, anti_aliasing=True)
        # Convert to grayscale
        gray_img = rgb2gray(small_image)
        # Apply Gabor filter
        real, imag = gabor(gray_img, frequency=frequency)
        # Combine the real and imaginary parts (one could also use just one,
        # depending on the application) and flatten into a feature vector
        gabor_features.append(np.stack([real, imag], axis=-1).flatten())
    return np.array(gabor_features)
# Extract Gabor features for training and testing sets
train_gabor_features = extract_gabor_features(images['train'])
test_gabor_features = extract_gabor_features(images['test'])
Apply DAISY feature extraction
from skimage.feature import daisy
from skimage import img_as_float, transform
def extract_daisy_features(images, resize_dim=(100, 100), step=100, radius=16, rings=2, histograms=6, orientations=8):
    daisy_features = []
    for image in images:
        # Resize image to reduce computation time
        small_image = transform.resize(image, resize_dim, anti_aliasing=True)
        # Convert image to grayscale as DAISY works on single-channel images
        gray_image = rgb2gray(small_image)
        # Convert image to the float type required by the DAISY function
        image_float = img_as_float(gray_image)
        # Extract DAISY features from the image
        descs = daisy(image_float, step=step, radius=radius, rings=rings,
                      histograms=histograms, orientations=orientations, visualize=False)
        # Flatten the DAISY descriptors to create a single feature vector per image
        daisy_features.append(descs.flatten())
    return np.array(daisy_features)
train_daisy_features = extract_daisy_features(images['train'])
test_daisy_features = extract_daisy_features(images['test'])
3.5 Analyze Various Feature Extraction Methods¶
- Visualize the pairwise differences among all extracted features, ordered by class, and find the closest image under each feature extraction method.
- Build a nearest neighbor classifier to evaluate actual classification performance for each method.
from sklearn.metrics.pairwise import pairwise_distances
def imshow(image, title=None):
    plt.imshow(image)
    plt.title(title)
    plt.axis('off')

# Function to find and show closest images based on DAISY features
def show_closest_images(index, dist_matrix, images):
    distances = dist_matrix[index].copy()
    # Set the distance to itself to infinity to avoid self-selection
    distances[index] = np.inf
    closest_index = np.argmin(distances)
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    imshow(images[index], "Original Image")
    plt.subplot(1, 2, 2)
    imshow(images[closest_index], "Closest Image based on DAISY features")
    plt.show()
# Show the closest image to the first image in the training set
# First, calculate the pairwise distances between DAISY features of training images
dist_matrix = pairwise_distances(train_daisy_features)
# Then, display the original and closest image in the training set
show_closest_images(2, dist_matrix, images['train'])
# Function to find and show closest images based on Gabor features
def show_closest_images(index, dist_matrix, images):
    distances = dist_matrix[index].copy()
    distances[index] = np.inf
    closest_index = np.argmin(distances)
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    imshow(images[index], "Original Image")
    plt.subplot(1, 2, 2)
    imshow(images[closest_index], "Closest Image based on Gabor features")
    plt.show()

# Calculate the pairwise distances between Gabor features of training images
train_dist_matrix = pairwise_distances(train_gabor_features)
# Then, display the original and closest image in the training set
show_closest_images(2, train_dist_matrix, images['train'])
Nearest Neighbors Classification with each Feature Space¶
# KNN classifier for GRADIENT features
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
# Function to extract gradient magnitude features from a single image
def extract_gradient_features(img, uniformed_size=(224, 224)):
    # Reshape and convert the image to grayscale
    img = img.reshape((*uniformed_size, 3))
    gray_img = rgb2gray(img)
    # Calculate gradient magnitudes
    gradient_mag = np.sqrt(sobel_v(gray_img)**2 + sobel_h(gray_img)**2)
    # Flatten the gradient magnitude matrix to create a feature vector
    feature_vector = gradient_mag.flatten()
    return feature_vector
# Extract features for all images
train_features = np.array([extract_gradient_features(img)
for img in images['train']])
test_features = np.array([extract_gradient_features(img)
for img in images['test']])
# Split the training data into a smaller train and validation set
X_train, X_val, y_train, y_val = train_test_split(
train_features, train_labels, test_size=0.2, random_state=42)
# Initialize and train the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Predict on the validation set
y_pred = knn.predict(X_val)
# Evaluate the classifier
print("Confusion Matrix:")
print(confusion_matrix(y_val, y_pred))
print("\nClassification Report:")
print(classification_report(y_val, y_pred))
Confusion Matrix:
[[62 0 0 47 6 68 2]
[18 3 0 5 2 19 1]
[17 0 6 9 2 24 6]
[19 0 0 19 0 23 0]
[ 9 0 0 20 18 31 2]
[ 9 0 0 14 4 59 1]
[19 0 0 24 7 65 31]]
Classification Report:
precision recall f1-score support
0 0.41 0.34 0.37 185
1 1.00 0.06 0.12 48
2 1.00 0.09 0.17 64
3 0.14 0.31 0.19 61
4 0.46 0.23 0.30 80
5 0.20 0.68 0.31 87
6 0.72 0.21 0.33 146
accuracy 0.30 671
macro avg 0.56 0.27 0.26 671
weighted avg 0.53 0.30 0.29 671
# KNN classifier for Gabor features
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
# Train a Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(train_gabor_features, labels['train'])
# Predict on the test set
predicted_labels = knn.predict(test_gabor_features)
# Classification report
print(classification_report(labels['test'], predicted_labels))
# Confusion matrix
cm = confusion_matrix(labels['test'], predicted_labels)
sns.heatmap(cm, annot=True, fmt="d")
precision recall f1-score support
Audi 0.35 0.73 0.47 199
Hyundai Creta 0.97 0.58 0.73 67
Mahindra Scorpio 1.00 0.25 0.40 75
Rolls Royce 0.17 0.18 0.17 74
Swift 0.77 0.47 0.59 102
Tata Safari 0.68 0.72 0.70 106
Toyota Innova 0.82 0.39 0.53 190
accuracy 0.51 813
macro avg 0.68 0.47 0.51 813
weighted avg 0.65 0.51 0.52 813
<Axes: >
# KNN classifier for DAISY features
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
# Train a Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(train_daisy_features, labels['train'])
# Predict on the test set
predicted_labels = knn.predict(test_daisy_features)
# Classification report
print(classification_report(labels['test'], predicted_labels))
# Confusion matrix
cm = confusion_matrix(labels['test'], predicted_labels)
sns.heatmap(cm, annot=True, fmt="d")
precision recall f1-score support
Audi 0.48 0.44 0.46 199
Hyundai Creta 0.47 0.64 0.54 67
Mahindra Scorpio 0.50 0.39 0.44 75
Rolls Royce 0.20 0.16 0.18 74
Swift 0.54 0.53 0.53 102
Tata Safari 0.65 0.70 0.67 106
Toyota Innova 0.54 0.59 0.56 190
accuracy 0.51 813
macro avg 0.48 0.49 0.48 813
weighted avg 0.50 0.51 0.50 813
<Axes: >
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
knn_dsy = KNeighborsClassifier(n_neighbors=1)
knn_gabor = KNeighborsClassifier(n_neighbors=1)
knn_dsy.fit(train_daisy_features, train_labels)
acc_dsy = accuracy_score(knn_dsy.predict(test_daisy_features), test_labels)
knn_gabor.fit(train_gabor_features, train_labels)
acc_gabor = accuracy_score(knn_gabor.predict(test_gabor_features), test_labels)
# report accuracy
print(f"Daisy Accuracy:{100*acc_dsy:.2f}%, Gabor accuracy:{100*acc_gabor:.2f}%")
Daisy Accuracy:50.68%, Gabor accuracy:51.05%
Based on the provided output, we can analyze the performance of the three feature extraction techniques: ordered gradients, Gabor features, and DAISY features, and assess their promise for the prediction task.
Data Reduction Analysis
- Gabor Features:
- The classification report shows good precision and recall values for most classes, especially 'Hyundai Creta' and 'Tata Safari'. However, there are still issues with certain classes such as 'Mahindra Scorpio', which has high precision but very low recall. The overall accuracy of 51% is a significant improvement over the ordered gradients method (30%).
- DAISY Features:
- The performance is similar to the Gabor features, with an overall accuracy of 51%. Precision and recall values across the different classes are more balanced than with Gabor features, indicating more consistent classification performance.
- The DAISY feature extraction shows promise as it maintains a balance between precision and recall better than the other two methods.
Nearest neighbor classifier performance using DAISY and Gabor features (accuracies around 51%) suggests these methods provide features that are somewhat useful for classification, but there is room for improvement and a need for parameter tuning.
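One low-cost next step would be tuning the classifier itself, e.g., a cross-validated grid search over the neighbor count and distance weighting. A sketch on synthetic stand-in features (the feature matrix, labels, and grid here are illustrative, not our actual DAISY/Gabor data):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for a feature matrix (e.g., DAISY or Gabor features)
X = rng.normal(size=(210, 40))
y = rng.integers(0, 7, size=210)  # 7 car classes

# Search over neighbor counts and distance weighting with 3-fold CV
param_grid = {'n_neighbors': [1, 3, 5, 7], 'weights': ['uniform', 'distance']}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

Applied to the real feature matrices, the grid search replaces our fixed choice of `n_neighbors=1` with a value selected on held-out folds.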
Visual Analysis:
- Gradient Features:
- The image appears clear with distinguishable details of the car and the surroundings.
- This image represents the gradient magnitude of the original image, highlighting the edges and contours of the car and its environment, emphasizing areas of significant intensity changes.
- In gradient magnitude images, bright areas represent high changes in intensity which typically correspond to edges in the image. This is visible around the contours of the car, the license plate, the design elements on the car's body, and the boundaries between different elements in the background.
- This is crucial in various computer vision tasks such as edge detection, object recognition, and feature extraction, as edges often carry important information about the shape and structure of objects. The gradient magnitude feature extraction seems effective in highlighting the structural details of the car, which could be beneficial for tasks requiring edge or shape detection.
- DAISY Features:
- Original Image: Shows a car in a city environment.
- Closest Image based on DAISY features: Displays a car that, while in a different environment, appears to have a similar body shape and style as the original. The angle and positioning also seem quite matched, which might have contributed to it being recognized as the closest image. This indicates that DAISY features are likely to capture and compare geometric patterns and spatial characteristics of the vehicles.
- Gabor Features:
- Original Image: The same original image as used in the DAISY comparison.
- Closest Image based on Gabor features: Presents a different sedan under different environmental conditions but with a somewhat similar body style. The color and setting are markedly different, yet the Gabor features have identified a vehicle with similar textural and edge characteristics. This suggests that while Gabor features might not directly focus on the exact shape, they capture the texture and pattern aspects effectively, leading to a match based on these attributes rather than on color or exact geometry such as environmental background.
4. Exceptional Work - Key Point Matching Using DAISY Features¶
For key point matching, we will use DAISY features but focus on matching individual key points between images instead of comparing whole-image descriptors.
from skimage.feature import match_descriptors, daisy
from skimage import img_as_float, transform, color
def extract_daisy_features_and_keypoints(images, resize_dim=(100, 100), step=180, radius=15,
                                         rings=2, histograms=6, orientations=8):
    daisy_descriptors = []
    for idx, image in enumerate(images):
        print(f'{idx}', end='\r')
        small_image = image.reshape((*resize_dim, 3))
        gray_image = color.rgb2gray(small_image)
        # Extract DAISY features from the image, along with their keypoints
        descs = daisy(gray_image, step=step, radius=radius, rings=rings,
                      histograms=histograms, orientations=orientations)
        # Store descriptors and keypoints
        daisy_descriptors.append(descs)
    return np.array(daisy_descriptors)
# Feature extraction
test_descriptors = extract_daisy_features_and_keypoints(images['test'], (224, 224), 25, 10)
resize_dim = (224, 224)
small_image = images['test'][0].reshape((*resize_dim, 3))
# Convert image to grayscale as DAISY works on single channel images
gray_image = color.rgb2gray(small_image)
# Extract DAISY features from the image, along with their keypoints
descs, descs_img = daisy(gray_image, step=25, radius=10,
rings=2, histograms=6, orientations=8, visualize=True)
plt.imshow(descs_img)
<matplotlib.image.AxesImage at 0x1ed9310f4f0>
# Match each pair of descriptors; for each image, find the pair with the most matched keypoints
cache = {}
image_matches = []
for i in range(test_descriptors.shape[0]):
    max_matched_keypoints = -1
    max_matched_idx = -1
    for j in range(i+1, test_descriptors.shape[0]):
        if (j, i) in cache:
            num_matched = cache[(j, i)]
        else:
            test_descriptor1 = test_descriptors[i]
            test_descriptor2 = test_descriptors[j]
            test_descriptor1 = test_descriptor1.reshape(
                (-1, test_descriptor1.shape[-1]))
            test_descriptor2 = test_descriptor2.reshape(
                (-1, test_descriptor2.shape[-1]))
            matches = match_descriptors(test_descriptor1, test_descriptor2)
            num_matched = matches.shape[0]  # shape: (keypoints, 2)
            cache[(i, j)] = num_matched
        if num_matched > max_matched_keypoints:
            max_matched_idx = j
            max_matched_keypoints = num_matched
    image_matches.append((i, max_matched_idx, max_matched_keypoints))
# visualize some pairs
import random
idx = 1
plt.figure(figsize=(20, 5))
for i, j, max_match in random.choices(image_matches, k=10):
    plt.subplot(2, 10, idx)
    plt.imshow(images['test'][i])
    plt.subplot(2, 10, idx+10)
    plt.imshow(images['test'][j])
    plt.title(f"points matched:\n{max_match}")
    idx += 1
The provided image displays pairs of cars with the number of keypoints matched between each pair using DAISY feature extraction and keypoint matching. The numbers of matched points vary between the pairs, indicating differences in similarity or feature correspondence between the images.
From the plot, we can observe:
- The number of matched points varies significantly across different image pairs. This variation suggests that the DAISY keypoint matching method can discriminate between different levels of similarity among the images.
- High numbers of matched points suggest a strong similarity between the images in terms of their key features. This could be due to similarities in car models, angles of photography, or specific features like headlights, grilles, and body shapes that DAISY features effectively capture.
- Lower numbers of matched points suggest that while there are some similarities between the images, there are also significant differences, which could be due to various factors like car model differences, changes in viewpoint, or different image backgrounds.
The effectiveness of keypoint matching using DAISY can be further evaluated by comparing these results against a baseline (e.g., UniPose, https://arxiv.org/abs/2310.08530), such as the total number of possible matches, the performance of other feature extraction methods, or the results of using the whole-image DAISY vector without keypoint matching. However, the actual effectiveness would depend on the specific application, such as whether the task is to identify the same model of car, differentiate between different models, or identify cars under various conditions.
If the objective is to identify or match specific car models under varying conditions, a higher number of keypoints matched could indicate better performance. Conversely, if the objective is to differentiate between different car models, you would expect lower numbers of matched keypoints for different models and higher numbers for the same models.
References:
- Kaggle. Cars Image Dataset. https://www.kaggle.com/datasets/kshitij192/cars-image-dataset (Accessed 2024-02-14)
- Michael Abebe Berwo et al. "Deep Learning Techniques for Vehicle Detection and Classification from Images/Videos: A Survey." Sensors 2023, 23(10), 4832. https://doi.org/10.3390/s23104832
- Sumeyra Tas et al. "Deep Learning-Based Vehicle Classification for Low Quality Images." Sensors 2022, 22(13), 4740. https://doi.org/10.3390/s22134740
- Jie Yang et al. "UniPose: Detecting Any Keypoints." https://arxiv.org/pdf/2310.08530.pdf (Accessed 2024-02-20)